Skip to content

CLN: small ops optimizations #28036

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Aug 26, 2019
Merged

Conversation

jbrockmendel
Copy link
Member

@jorisvandenbossche you've mentioned some ops optimizations; any suggestions for things to add here?

@@ -540,9 +540,6 @@ def _combine_frame(self, other, func, fill_value=None, level=None):
this, other = self.align(other, join="outer", level=level, copy=False)
new_index, new_columns = this.index, this.columns

if self.empty and other.empty:
return self._constructor(index=new_index).__finalize__(self)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is covered by a test?

(in general, I would rather leave sparse alone, this is deprecated anyway)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it is.

I understand the impulse to leave sparse alone, but ATM some of it is a barrier to simplifying ops code

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I was just saying on the other PR as well, there is also the option to just leave sparse as is, but still go forward with simplifying the ops code for non-sparse

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can also make it easier for you to focus only on the ops code for normal frame (without additional complexity of sharing with sparse)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I understand. core.ops functions call the DataFrame/SparseDataFrame methods, so anything I want to do to change the behavior of the non-sparse methods needs to be reflected in the sparse methods

@jorisvandenbossche
Copy link
Member

Do you have an idea how much those changes matter?

(anyway, even if it is almost negligible, I think eg checking the actual dtype with the is_..._dtype functions is better (more explicit) than passing the "container that has a dtype" so an improvement anyway)

you've mentioned some ops optimizations; any suggestions for things to add here?

Not directly related to the ones you are doing here. I need to look it up again (I described it in the issue about the slowdown), but it was mainly about the way we iterate through the columns and how to reconstruct the DataFrame/Series back in the end, not so much the ops itself.

@jbrockmendel
Copy link
Member Author

Do you have an idea how much those changes matter?

Haven't measured these specifically, but I know the is_foo_dtype checks get about a 2x speedup from passing dtype #27224

@jbrockmendel
Copy link
Member Author

reverted sparse changes, except for one cosmetic thing that annoyed me

@jbrockmendel
Copy link
Member Author

updated to change if is_extension_array_dtype(right) and not is_scalar(right) to reverse the order of the two checks. if we happen to have a str, the is_extension_array_dtype(right) can be fairly expensive

@@ -5318,7 +5325,7 @@ def _arith_op(left, right):

def _combine_match_index(self, other, func, level=None):
left, right = self.align(other, join="outer", axis=0, level=level, copy=False)
assert left.index.equals(right.index)
# at this point we have `left.index.equals(right.index)`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we want to remove the asserts here? If someone is that set on getting this optimization couldn't they just use the -OO flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i put these assertions in when trying to understand the ops code a while back. having a comment to the same effect performs the same task for the reader.

@WillAyd WillAyd added the Clean label Aug 26, 2019
@WillAyd WillAyd added this to the 1.0 milestone Aug 26, 2019
@WillAyd WillAyd merged commit 3577746 into pandas-dev:master Aug 26, 2019
@WillAyd
Copy link
Member

WillAyd commented Aug 26, 2019

Thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the smallopts branch August 27, 2019 00:19
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants